1
Bridging Clouds with CernVM:
ATLAS/PanDA example
Wenjing Wu
2010-8-27
2
Outline
ATLAS computing model (PanDA)
Extending ATLAS computing model to use Cloud computing resources
Challenges
Solution
Work Done
3
1. Jobs are submitted to the PanDA server
2. Pilots are submitted to worker nodes
3. Pilot checks the environment and fetches jobs from the PanDA server
4. Pilot uploads and registers output files after the job is done
5. Pilot updates the job status on the PanDA server
6. PanDA server manages the final data transfer
Storage Element
Logical File Catalog
PanDA - the Production and Distributed Analysis system for the ATLAS Experiment
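The six steps above can be illustrated with a small in-memory sketch. This is not the PanDA API: `ToyPandaServer`, `Job`, `run_pilot`, and the `se://` prefix are hypothetical names chosen only to show the order in which a pilot fetches a job, stores and registers the output, and reports status.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    status: str = "defined"
    outputs: list = field(default_factory=list)


class ToyPandaServer:
    """In-memory stand-in for the PanDA server (hypothetical, not the real API)."""

    def __init__(self):
        self.queue = []
        self.jobs = {}

    def submit(self, job):
        # Step 1: a job is submitted to the server.
        self.jobs[job.job_id] = job
        self.queue.append(job)

    def get_job(self):
        # Step 3: a pilot asks the server for work.
        return self.queue.pop(0) if self.queue else None

    def update_status(self, job_id, status):
        # Step 5: a pilot reports the job status.
        self.jobs[job_id].status = status


def run_pilot(server, storage_element, file_catalog):
    """Steps 3 to 5 as seen from a pilot running on a worker node."""
    job = server.get_job()
    if job is None:
        return None
    server.update_status(job.job_id, "running")
    output = f"job{job.job_id}.out"            # pretend the payload produced this file
    storage_element.add(output)                # step 4: upload to the Storage Element
    file_catalog[output] = f"se://{output}"    # step 4: register in the file catalog
    job.outputs.append(output)
    server.update_status(job.job_id, "finished")
    return job
```

Note that in this model the pilot itself uploads, registers, and reports status, which is why it needs credentials on the worker node.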
4
Extending ATLAS computing model to use Cloud Computing resources
What are Clouds (in today's common terms)?
Virtualized computing resources provided by academic and commercial institutions (e.g. CERN lxcloud, Amazon EC2)
The resources provided by users participating in volunteer computing projects (e.g. BOINC)
The goal:
Run ATLAS production jobs on Cloud Computing resources.
5
Challenges!
Transparency: users and production operators should not notice the difference
The whole set of Cloud resources should appear to PanDA server as just another Grid site
Credentials (which are essential for the functioning of the PanDA pilot) cannot be brought into the ‘untrusted’ environment (e.g. onto the machines of the volunteers)
6
Solve the challenge using CernVM
CernVM
Provides a lightweight virtual machine image containing the applications of LHC experiments
The application software is distributed through an HTTP-based content delivery network and is cached locally
Provides Co-Pilot: a framework for the delivery and execution of the workload on remote virtual machines
7
Co-Pilot Job Manager
Co-Pilot Storage Manager
Storage Element
Logical File Catalog
Co-Pilot Client
1. Submit PanDA job
2. Submit Co-Pilot job
3. Agent gets a Co-Pilot job, which launches the PanDA pilot
4. Pilot fetches a PanDA job and runs it
5. Agent uploads output to temporary storage after the job has finished
6. Storage Manager uploads and registers output files
7. Storage Manager updates the final job status on the PanDA server
Cloud resources provided through VMs running Co-Pilot Agent
CernVM Co-Pilot Integration!
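The dispatch side of this flow (steps 2 to 5) can be sketched as follows. The class and function names are hypothetical and do not reflect the real Co-Pilot protocol; the point is only that the Agent pulls work from the Job Manager and stages output to temporary storage without ever holding a credential.

```python
from collections import deque


class JobManager:
    """Toy Co-Pilot Job Manager: a queue of Co-Pilot jobs (hypothetical API)."""

    def __init__(self):
        self._queue = deque()

    def submit(self, copilot_job):
        # Step 2: the Co-Pilot client submits a Co-Pilot job.
        self._queue.append(copilot_job)

    def get_job(self):
        # Step 3: an Agent asks for a Co-Pilot job.
        return self._queue.popleft() if self._queue else None


class Agent:
    """Toy Co-Pilot Agent on an untrusted VM; it never holds a credential."""

    def __init__(self, job_manager, temporary_storage):
        self.job_manager = job_manager
        self.temporary_storage = temporary_storage

    def run_once(self):
        copilot_job = self.job_manager.get_job()
        if copilot_job is None:
            return None
        # Steps 3 and 4: the Co-Pilot job launches the pilot, which runs a PanDA job.
        job_id, output = copilot_job()
        # Step 5: stage the output; steps 6 and 7 happen on the trusted side.
        self.temporary_storage[job_id] = output
        return job_id


def launch_pilot():
    """Stand-in for the PanDA pilot payload; returns (job id, output bytes)."""
    return 42, b"payload output"
```

A usage example under the same assumptions: submit `launch_pilot` to a `JobManager`, create an `Agent` with an empty dict as temporary storage, and call `run_once()`; the dict then maps the job id to the staged output.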
8
Work Done (1)
Set up the CERNVM site (part of ATLAS Grid infrastructure)
Is a dynamic virtual cluster formed by virtual machines running CernVM Co-Pilot Agents
Is configured according to ATLAS computing conventions
Appears to ATLAS Grid central services as a Tier 2 site
9
Work Done (2)
Adaptation of PanDA Pilot:
Adding support for the heterogeneous structure of the software repository
Adding support for saving job output metadata and job status files
Development of Co-Pilot Storage Manager
A component running in the trusted environment and acting as a proxy between Co-Pilot agents and PanDA Grid services
10
11
Thanks!
12
Solve the challenge using CernVM
CernVM Co-Pilot helps run ATLAS PanDA jobs in a non-credentialed computing environment.
CernVM Co-Pilot Components:
Co-Pilot Client: submits jobs to the Co-Pilot Job Manager
Co-Pilot Server:
Co-Pilot Job Manager: dispatches jobs to Co-Pilot Agents
Co-Pilot Storage Manager: uploads and registers output files, changes job status using the credential
Co-Pilot Agent: runs the jobs on non-credentialed computing nodes
13
Ingredients
CernVM
Provides an ultralight image for different hypervisors
ATLAS software is distributed by CVMFS, cached locally
Co-Pilot
Co-Pilot Agent is distributed with CernVM image
Schedules jobs to CernVM virtual clusters
14
Co-Pilot Storage Manager
How does the Co-Pilot SM (Storage Manager) work?
SM receives a “JobDone” message from the Co-Pilot Agent (the JobID is included)
SM calls Co-Pilot_Data_Mover, which extracts the metadata of the job output from the pilot log, uploads the files to the designated SE and registers them in the designated LFC catalog
SM verifies the status of the file upload and registration
SM calls Co-Pilot_Job_Status_Updater, which updates the job status on the PanDA server (finished or failed)
Both Co-Pilot_Data_Mover and Co-Pilot_Job_Status_Updater are Python scripts using libraries from the pilot source code
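The Storage Manager steps above can be sketched in Python. This is a toy model under stated assumptions, not the actual scripts: the functions below only mimic the roles of Co-Pilot_Data_Mover and Co-Pilot_Job_Status_Updater, and the `output: <name>` log line format, the dict-based message, and the in-memory SE and catalog are all hypothetical.

```python
def data_mover(job_id, pilot_log, storage_element, catalog):
    """Toy stand-in for Co-Pilot_Data_Mover.

    Assumes (hypothetically) one 'output: <name>' line per output file in the log.
    """
    names = [line.split(":", 1)[1].strip()
             for line in pilot_log.splitlines()
             if line.startswith("output:")]
    for name in names:
        storage_element.add(name)                 # upload to the designated SE
        catalog[name] = f"job-{job_id}/{name}"    # register in the catalog
    return names


def job_status_updater(job_id, status, panda_status):
    """Toy stand-in for Co-Pilot_Job_Status_Updater."""
    panda_status[job_id] = status


def handle_job_done(message, pilot_logs, storage_element, catalog, panda_status):
    """React to a 'JobDone' message that carries the JobID."""
    job_id = message["JobID"]
    names = data_mover(job_id, pilot_logs[job_id], storage_element, catalog)
    # Verify that every output was both uploaded and registered.
    ok = bool(names) and all(n in storage_element and n in catalog for n in names)
    job_status_updater(job_id, "finished" if ok else "failed", panda_status)
    return panda_status[job_id]
```

In this sketch a job whose log lists no output files is reported as failed; whether the real Storage Manager treats that case the same way is not stated in the slides.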