ATLAS Distributed Analysis
David Adams, BNL
September 30, 2004
CHEP2004 Track 5: Distributed Computing Systems and Experiences
CHEP2004 Atlas Distributed Analysis Sept 30, 2004 2David Adams
ATLAS
Contents

Goals
Key concepts
• Datasets
• Transformations
• Jobs
• AJDL
Service architecture
Analysis services
• DIAL
• ATPROD
• ARDA
Catalog services
Data management services
Clients
Status
ARDA
Conclusions
Contributors
More information

Goals

Provide to globally distributed users:
• Access to globally distributed data that
  – Is comprehensible
  – Enables selection of relevant data
  – Enables sensible placement of data
• Means to perform globally distributed processing on this data
  – High-level view that hides details of underlying middleware
  – But enables monitoring and debugging
  – Automatic, complete and accurate provenance
All the above must be easy to use
• Well-integrated with analysis environments
  – ROOT, Python, etc.
• Graphical views where appropriate
  – Browse and examine data
  – Monitor jobs, …

Key concepts

Dataset
• Describes a collection of data
  – E.g. a collection of reconstructed events
  – A collection of histograms, …
Transformation
• Defines an operation to be performed on the data
• Dataset → Dataset
• Application + task (user configuration of application)
Job
• Instance of a transformation
• Typical user request processed as a collection of sub-jobs
  – Same transformation acting on sub-datasets
  – Plus dataset splitting of input and merging of output

Key concepts (cont)

[Figure: workflow diagram. Boxes: User analysis framework (ROOT, GANGA, …); Application (Athena, dialpaw, ROOT, …); Task; Code; Dataset (event data, summary data, tuples, …); Dataset 1 and Dataset 2; Analysis Service; Job 1 and Job 2; Result 1 and Result 2; Result. Labeled steps: 1. create or locate; 2. select; 3. create or select; 4. select; 5. submit(app,tsk,ds); 6. split; 7. create; 8. run(app,tsk,ds1) and run(app,tsk,ds2); 9. fill; 10. gather.]

Datasets

Dataset includes
• Identifier
• Location of data, e.g. list of logical files
  – Absent for virtual datasets
• Content (i.e. description of the content)
  – E.g. list of event IDs and the type of data for each event
  – Or a list of histogram names
• List of constituent datasets
  – Usually their IDs
  – When dataset is composite, access to location and content may require use of the constituent datasets
Dataset selection catalog holds metadata
Dataset replica catalog holds replica mapping
• 1 virtual → N concrete dataset mapping
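
The dataset description above can be pictured as a small record type. The following is only a sketch in Python, not the DIAL C++ classes: the class name, field names, the rule used for "virtual", and all IDs and file names are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Dataset:
    """Hypothetical in-memory stand-in for a dataset description."""
    id: str
    # Logical file names; None means no location is recorded.
    location: Optional[List[str]] = None
    # Description of the content, e.g. event IDs or histogram names.
    content: List[str] = field(default_factory=list)
    # IDs of constituent datasets; non-empty for a composite dataset.
    constituents: List[str] = field(default_factory=list)

    def is_virtual(self) -> bool:
        # Assumption: virtual = no location and no constituents to consult.
        return self.location is None and not self.constituents


def all_files(ds_id: str, repo: Dict[str, Dataset]) -> List[str]:
    """Collect logical files, descending into constituents when composite."""
    ds = repo[ds_id]
    files = list(ds.location or [])
    for child_id in ds.constituents:
        files.extend(all_files(child_id, repo))
    return files


# Replica catalog: one virtual dataset ID maps to N concrete dataset IDs.
replica_catalog: Dict[str, List[str]] = {"ds-virtual": ["ds-bnl", "ds-cern"]}

repo = {
    "part-1": Dataset("part-1", ["lfn:events_001.root"], ["evt-1", "evt-2"]),
    "part-2": Dataset("part-2", ["lfn:events_002.root"], ["evt-3"]),
    "whole": Dataset("whole", constituents=["part-1", "part-2"]),
}
print(all_files("whole", repo))
```

The composite dataset "whole" carries no files of its own, so its location is resolved through its constituents, as the last sub-bullet describes.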

Datasets (cont)

For ATLAS data, we identify
• Types of data
  – Used to define dataset categories
  – Category will be part of the content specification
• Types of datasets
  – Currently C++ classes with XML data representation
  – Third column indicates if this class exists
  – Likely will move to XML schema as the primary definition
• See table

Datasets (cont)

Name    Type                         Exists?  Description
EVIDS   EventDataset                 ×        List of event IDs
EVGEN   AtlasPoolEventDataset        ×        From event generator
HITS    AtlasPoolEventDataset        ×        Hits, e.g. from GEANT
DIGITS  AtlasPoolEventDataset        ×        Digitization of hits
RAW     AtlasByteStreamEventDataset           Raw data
ESD     AtlasPoolEventDataset        ×        Event summary data
AOD     AtlasPoolEventDataset        ×        Analysis oriented data
TAG     AtlasPoolTagEventDataset              Event metadata
NTUP    RootNtupleDataset                     Ntuples
HISTO   RootHistogramDataset         ×        Histograms
CBNT    CbntDataset                  ×        DC1 combined ntuples
TEXT    TextDataset                           Text data, e.g. log files

Transformations

Transformation
• Describes an operation to act on a dataset to produce a new dataset
• Has two components
  – Application = code shared by multiple transformations
    > Usually scripts to locate and run code in software packages
  – Task = user-supplied configuration (parameters or code)
Task
• List of files
  – Presently embedded in task
  – Later could also be logical files
• Named parameters
  – To be added soon
• Typically created by user submitting the job

Transformations (cont)

Application
• Two entry points (presently scripts)
  – build_task fetches task files, compiles, etc.
  – run creates output dataset from input dataset and built task
• Typically created by application developer
Software package management
• Need an interface to enable build_task and run scripts to locate software on any machine
• E.g. “locate mypkg 1.2.3” returns /usr/contrib/mypkg/1.2.3/rh73_gcc73
• Also support querying and installation
• Implement as thin layer on existing package management systems
  – Pacman, RPM, local build, …
• Use service to handle installation and removal of packages
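
A minimal sketch of the package-locating interface, assuming a root/name/version/platform directory layout inferred from the single example path above. The class and method names are hypothetical, and the install step only records the package; a real layer would delegate to Pacman, RPM or a local build.

```python
from pathlib import PurePosixPath
from typing import List, Set, Tuple


class PackageLocator:
    """Hypothetical thin layer over an existing package manager."""

    def __init__(self, root: str, platform: str) -> None:
        self._root = PurePosixPath(root)
        self._platform = platform
        self._installed: Set[Tuple[str, str]] = set()

    def install(self, name: str, version: str) -> None:
        # Stand-in for a call into Pacman, RPM or a local build.
        self._installed.add((name, version))

    def query(self, name: str) -> List[str]:
        """Return the installed versions of a package, sorted as strings."""
        return sorted(v for n, v in self._installed if n == name)

    def locate(self, name: str, version: str) -> str:
        """Return the installation directory, or raise if not installed."""
        if (name, version) not in self._installed:
            raise LookupError(f"{name} {version} is not installed")
        return str(self._root / name / version / self._platform)


locator = PackageLocator("/usr/contrib", "rh73_gcc73")
locator.install("mypkg", "1.2.3")
print(locator.locate("mypkg", "1.2.3"))
```

With this shape, build_task and run scripts would only ever call locate and never hard-code a site-specific path.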

Transformations (cont)

Table: transformations by input category (rows) and output category (columns).
Output columns: EVTIDS, EVGEN, HITS, DIGITS, RAW, ESD, AOD, TAG, NTUP, HISTO

Input      Transformations (in column order)
(no input) IDBLD, DAQ
EVTIDS     GEN
EVGEN      G4SIM, G4SIM, G4SIM
HITS       DIGI, DIGI, DIGI
DIGITS     PACK, RECO, RECO, RECO
RAW        UNPACK
ESD        AODBLD
AOD        SELECT, TAGBLD, ANALYZE, ANALYZE
TAG        SELECT
NTUP       ANALYZE, ANALYZE

For ATLAS we identify the above transformations
• Characterized by input and output dataset categories
• Most common ones listed; others are possible
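
Because transformations are characterized by dataset categories, a client could ask which ones accept a given input. The sketch below uses only the row labels and transformation names from the table above. Output categories are deliberately left out, the row without an input label is omitted, and the function names are hypothetical.

```python
from typing import Dict, List

# Input category -> distinct transformation names in that row of the table.
TRANSFORMATIONS_BY_INPUT: Dict[str, List[str]] = {
    "EVTIDS": ["GEN"],
    "EVGEN": ["G4SIM"],
    "HITS": ["DIGI"],
    "DIGITS": ["PACK", "RECO"],
    "RAW": ["UNPACK"],
    "ESD": ["AODBLD"],
    "AOD": ["SELECT", "TAGBLD", "ANALYZE"],
    "TAG": ["SELECT"],
    "NTUP": ["ANALYZE"],
}


def transformations_for(category: str) -> List[str]:
    """Transformations listed for a given input dataset category."""
    return list(TRANSFORMATIONS_BY_INPUT.get(category, []))


def inputs_accepting(transformation: str) -> List[str]:
    """Input categories whose row lists the given transformation."""
    return sorted(
        category
        for category, names in TRANSFORMATIONS_BY_INPUT.items()
        if transformation in names
    )


print(transformations_for("AOD"))
print(inputs_accepting("ANALYZE"))
```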

Jobs

A job is an instance of a transformation acting on a dataset
• Output result is another dataset
• Partial result may be available before job is complete
Typical user-submitted job is split into sub-jobs
• By splitting input dataset and applying the same transformation to each sub-dataset
• Strategies for splitting and merging results must be provided
Provenance
• Dataset provenance is specified by recording the input dataset and transformation
• More complete information is available from the job:
  – Site, CPU, submission, start and stop times, …
  – Log files maintained for some period, perhaps as datasets
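
The split, run and merge cycle with provenance recording can be sketched as follows. This is an illustration under stated assumptions, not the DIAL scheduler: the dataset is reduced to a list of event numbers, the split strategy is round-robin, the merge strategy sums counters, and every name here is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Job:
    """One sub-job: a transformation applied to one sub-dataset."""
    input_events: List[int]
    transformation: str
    result: Dict[str, int] = field(default_factory=dict)


def split(events: List[int], n_sub: int) -> List[List[int]]:
    """Split strategy (assumed): deal events round-robin into n_sub parts."""
    return [events[i::n_sub] for i in range(n_sub)]


def merge(results: List[Dict[str, int]]) -> Dict[str, int]:
    """Merge strategy (assumed): add up counters with the same name."""
    merged: Dict[str, int] = {}
    for result in results:
        for key, value in result.items():
            merged[key] = merged.get(key, 0) + value
    return merged


def count_even_odd(events: List[int]) -> Dict[str, int]:
    """Stand-in transformation: histogram events by parity."""
    counts = {"even": 0, "odd": 0}
    for event in events:
        counts["even" if event % 2 == 0 else "odd"] += 1
    return counts


def run_job(
    events: List[int],
    name: str,
    transform: Callable[[List[int]], Dict[str, int]],
    n_sub: int,
) -> Tuple[Dict[str, int], Dict[str, object]]:
    sub_jobs = [Job(sub, name) for sub in split(events, n_sub)]
    for job in sub_jobs:  # a real service would run these in parallel
        job.result = transform(job.input_events)
    output = merge([job.result for job in sub_jobs])
    # Provenance: the input dataset and transformation that produced output.
    provenance = {"input": events, "transformation": name,
                  "sub_jobs": len(sub_jobs)}
    return output, provenance


output, provenance = run_job(list(range(10)), "count_even_odd",
                             count_even_odd, 3)
print(output, provenance["transformation"])
```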

AJDL

AJDL = Abstract Job Definition Language
Components are representations of
• Dataset
• Transformation = Application + Task
• Job
• JobPreferences
• File
• Identifiers for all the above
Presently defined as C++ classes
• With methods to write to and read from XML
  – Different for each subclass of Dataset
  – Same for subclasses of Job
• XML specified in DTD files
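
To illustrate what "methods to write to and read from XML" could look like for one component, here is a sketch of a Task round trip using Python's standard xml.etree.ElementTree. The element and attribute names are invented for this example and are not taken from the AJDL DTD files.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Hypothetical task: an identifier plus a list of file names."""
    id: str
    files: List[str] = field(default_factory=list)

    def to_xml(self) -> str:
        # Element and attribute names are assumptions, not the real DTD.
        root = ET.Element("task", {"id": self.id})
        for name in self.files:
            ET.SubElement(root, "file", {"name": name})
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "Task":
        root = ET.fromstring(text)
        files = [f.get("name", "") for f in root.findall("file")]
        return cls(root.get("id", ""), files)


task = Task("task-42", ["fillhist.cxx", "finish.cxx"])
xml_text = task.to_xml()
restored = Task.from_xml(xml_text)
print(xml_text)
```

A round trip through the XML string yields an equal object, which is what allows a component to be stored at one site and rebuilt at another.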

AJDL (cont)

Look at moving to XML schema
• Automatically derive classes from XML definitions
  – Automatic support for other languages (Python, Java, …)
• In collaboration with GANGA and others
At the same time
• Try to find one representation for all datasets
• Introduce separate type for event ID lists
  – Often too large to carry around in a dataset
Also interested in specifying interfaces for AJDL services
• Those that operate on AJDL components
• Services listed later
Interested in working with others on these specifications

Service architecture

ADA (ATLAS Distributed Analysis) itself is distributed
• Allows data access and job management to be distributed
  – Important for scaling to a large number of users
• Collection of web services
  – Analysis service for job processing
  – Job monitoring
  – Catalog services
    > Metadata
    > Repository
    > Replica (not only for files)
• Users interact through clients
  – ROOT client from DIAL
  – Python client from GANGA

Service architecture

[Figure: three-layer service architecture.
 GUI and command line clients: ROOT, PYTHON
 High level services for cataloging and job submission and monitoring: AMI DBS, DIAL AS, ATPROD AS, ARDA AS
 Workload management systems: LSF, CONDOR, gLite WMS, ATPROD
 Interface labels between layers: AJDL, AMI ws, sh, SQL, gLite]

DIAL analysis service

Two instances running at BNL
• Long running jobs using Condor job submission
• Interactive response using fast LSF queue
Working to improve interactive response
• Submit jobs to perform result merging
  – Presently done on service host
• Use parallel jobs for merging
• Long term, look at the use of job agents
  – Possibly as part of ARDA
Add service to act as switch
• Delegate jobs based on
  – Job requirements
  – Desired response time
  – Resource availability
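
The proposed switch service could be sketched as a selection function over the three criteria listed on this slide (job requirements, desired response time, resource availability). Everything in this example is hypothetical: the field names, the numbers, the two service entries, and the policy of preferring the fastest eligible service.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    typical_response_s: float  # expected turnaround in seconds
    free_slots: int            # resource availability
    max_events: int            # largest job the service accepts


@dataclass
class Request:
    n_events: int              # job requirement
    desired_response_s: float  # desired response time


def choose_service(req: Request, services: List[Service]) -> Optional[Service]:
    """Pick the fastest service that satisfies all three criteria."""
    candidates = [
        s for s in services
        if s.free_slots > 0
        and s.max_events >= req.n_events
        and s.typical_response_s <= req.desired_response_s
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda s: s.typical_response_s)


services = [
    Service("interactive-lsf", 10.0, 4, 10_000),
    Service("batch-condor", 3600.0, 200, 10_000_000),
]
chosen = choose_service(Request(1_000, 60.0), services)
print(chosen.name if chosen else "no service available")
```

A request that is too large for the interactive queue and too urgent for the batch queue returns None, leaving the caller to relax one of the constraints.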

ATPROD analysis service

Enable submission to the existing ATLAS production system
• At least for user-level production
Strategy
• Split input dataset
• Make an entry in the production catalog for each sub-job
• Monitor catalog and gather and merge results as jobs finish
• Same for the other analysis services
Not yet implemented

ARDA analysis service

Enable submission to the gLite WMS
• Let EGEE do the work of matchmaking, brokering, job tracking, monitoring, error reporting, …
There is a service to submit to the existing prototype system
Expect first release of gLite next month
• Quickly deploy an analysis service based on this
• Make regular updates taking advantage of more gLite features

Catalog services

Goals of ADA cataloging:
• Provide a repository for AJDL objects indexed by ID
  – Insert at site A and extract with ID at site B
• Enable users to assign metadata to objects and retrieve with queries
• Record dataset provenance
• Provide job monitoring
Identify three types of catalogs
• Repository
  – Map ID to XML string
• Metadata catalog
  – Map ID to named attributes
• Replica catalog
  – Map ID to a list of IDs
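
The three catalog types reduce to three mappings, which the sketch below models with in-memory dictionaries. The class and method names, IDs and attribute values are all assumptions for illustration. The slides plan to host the real catalogs in AMI, whose interface is not shown here.

```python
from typing import Dict, List


class Repository:
    """Maps an object ID to its XML string."""

    def __init__(self) -> None:
        self._xml: Dict[str, str] = {}

    def insert(self, obj_id: str, xml: str) -> None:
        self._xml[obj_id] = xml

    def extract(self, obj_id: str) -> str:
        return self._xml[obj_id]


class MetadataCatalog:
    """Maps an object ID to named attributes and supports simple queries."""

    def __init__(self) -> None:
        self._attrs: Dict[str, Dict[str, str]] = {}

    def set(self, obj_id: str, **attrs: str) -> None:
        self._attrs.setdefault(obj_id, {}).update(attrs)

    def query(self, **wanted: str) -> List[str]:
        """Return IDs whose attributes match every requested value."""
        return sorted(
            obj_id for obj_id, attrs in self._attrs.items()
            if all(attrs.get(k) == v for k, v in wanted.items())
        )


class ReplicaCatalog:
    """Maps an object ID to a list of replica IDs."""

    def __init__(self) -> None:
        self._replicas: Dict[str, List[str]] = {}

    def add(self, obj_id: str, replica_id: str) -> None:
        self._replicas.setdefault(obj_id, []).append(replica_id)

    def replicas(self, obj_id: str) -> List[str]:
        return list(self._replicas.get(obj_id, []))


repo, meta, reps = Repository(), MetadataCatalog(), ReplicaCatalog()
repo.insert("ds-1", "<dataset id='ds-1'/>")
meta.set("ds-1", category="AOD", project="dc2")
meta.set("ds-2", category="ESD", project="dc2")
reps.add("ds-1", "ds-1-bnl")
reps.add("ds-1", "ds-1-cern")
print(meta.query(category="AOD"), reps.replicas("ds-1"))
```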

Catalog services (cont)

Required global catalog instances
• Repositories for Dataset, Application, Task, Job
• Metadata catalog for Dataset
  – Same as that used for production?
• Replica catalog for Dataset
• More later
• First choice is to host these in AMI (soon)
Next add local job catalog to record analysis service state
• So service can be restarted without losing jobs
Later look at issues such as
• Distributed cataloging
• Private catalogs
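
The purpose of the local job catalog, surviving a service restart, can be illustrated with a small persistent table. This sketch uses Python's standard sqlite3 module purely as a stand-in; the slides do not say which storage the service would use, and the table layout, state names and class name are assumptions.

```python
import os
import sqlite3
import tempfile
from typing import List


class LocalJobCatalog:
    """Hypothetical on-disk record of job states for one analysis service."""

    def __init__(self, path: str) -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS jobs "
            "(id TEXT PRIMARY KEY, state TEXT NOT NULL)"
        )
        self._db.commit()

    def record(self, job_id: str, state: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO jobs (id, state) VALUES (?, ?)",
            (job_id, state),
        )
        self._db.commit()

    def unfinished(self) -> List[str]:
        rows = self._db.execute(
            "SELECT id FROM jobs WHERE state != 'done' ORDER BY id"
        )
        return [row[0] for row in rows]

    def close(self) -> None:
        self._db.close()


path = os.path.join(tempfile.mkdtemp(), "jobs.sqlite")

catalog = LocalJobCatalog(path)
catalog.record("job-1", "running")
catalog.record("job-2", "done")
catalog.close()  # simulate the service shutting down

restarted = LocalJobCatalog(path)  # simulate the service coming back up
recovered = restarted.unfinished()
restarted.close()
print(recovered)
```

After the simulated restart, the service can see which jobs still need attention instead of losing track of them.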

Data management services

DQ (Don Quijote) was developed as part of production
• Provides access to file replica catalogs from all three grids
• Enables file movement including between grids
• ADA will adopt this for replica management and movement
ATLAS has a plan to add a file transfer service
• Adopt this as well when available
SRM provides file management at the site level
• ATLAS expects sites to deploy this service
• DQ and ADA will use this as it is deployed
gLite has a suite of data management services
• Including SRM
• Rest of service model is complex; hide it behind DQ
  – Already have DQ interface to AliEn file catalog

Clients

DIAL provides a ROOT client
• ACLiC used to build dictionaries for DIAL classes
  – All DIAL classes available on the ROOT command line
  – Enables catalog browsing, job submission, monitoring, etc.
GANGA provides a Python client
• PyLCGDict used to build Python wrappers for DIAL classes
  – All DIAL classes available on the Python command line
• Later build Python-only client
  – Restricted functionality but
  – Greater portability
GUI
• GANGA is developing a GUI
  – Data browsing
  – Configure, submit and monitor jobs

Status

Present system includes
• ROOT and Python command line clients
• DIAL analysis services running
  – Interactive service at BNL
  – Batch service at BNL
• Datasets
  – Classes for combined ntuples, ATLAS-POOL event collections
  – All DC1 CBNT data
  – Few DC2 samples
• Transformations
  – DC1 CBNT histograms
  – DIGI: atlasdigi-8.5.0
  – RECO: atlas-reco-8.x.0, x = 3, 4, 5

ARDA

ATLAS-ARDA prototype
• ARDA is a CERN project to deliver prototype distributed analysis systems for the LHC experiments
  – Based on gLite (EGEE middleware)
• The ATLAS ARDA prototype makes use of the components shown in the figure
• Expect functional system this year

[Figure: the service architecture diagram from the Service architecture slide]

Conclusions

Status
• ADA is coming together but there is still much to do
• Still in demo mode; for serious use we must add
  – Dataset description of DC2 data
  – Repositories for applications, tasks, datasets and jobs in AMI
  – Dataset selection catalog in AMI
  – Dataset replica catalogs in AMI
  – Transformations for the full DC2 production/analysis chain
  – Means to move output data to a storage element
• Expect all this year
Future developments (beyond those above)
• Update AJDL moving to XML schema and adding WSDL
• GUI (expect this soon)
• ATPROD service to access more compute resources
• ARDA service to try out EGEE middleware
• Improvements to DIAL service to improve interactive response

Contributors

DIAL
• D. Adams, W. Deng, V. Sambamurthy, N. Chetan, C. Kannan
GANGA
• K. Harrison, C. Tan, A. Soroko
ARDA
• D. Liko, F. Orellana
AMI
• S. Albrand, J. Fulachier
ATLAS
• C. Haeberli, J. Bahilo, F. Fassi, G. Rybkine, M. Branco
Many useful discussions
• All the above and PPDG, GAG, gLite, …

More information

For more information on ADA, see the home page
http://www.usatlas.bnl.gov/ADA
Includes status of subprojects, relevant talks and documents, and links to associated projects
To try it out, run root demo 3 in the latest DIAL release
http://www.usatlas.bnl.gov/~dladams/dial/releases/0.92
See the ADA paper in the CHEP2004 proceedings