DØ Grid PP Plans – SAM, Grid, Ceiling Wax and Things
Iain Bertram
Lancaster University
Monday 5 November 2001
5/11/2001 Iain A Bertram - Lancaster 2
SAM, DØ, and the Grid
- DØ Basics
- What is SAM?
  - History
  - Current Deployment
  - Collaborators
- SAM, Grid and Future Developments
  - Overview of Plans
  - UK Plans
  - CDF and SAM
The DØ Experiment
Detector Data:
- 1,000,000 channels
- Event size: 250 KB
- Event rate: ~50 Hz
- On-line data rate: 12 MBps
- Est. 2-year totals (incl. processing and analysis): 1 × 10^9 events, ~0.5 PB

Monte Carlo Data:
- 5 remote processing centres
- Estimated ~300 TB in 2 years
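The quoted figures are internally consistent; a quick arithmetic cross-check:

```python
# Cross-check of the Run II data-rate figures quoted above.
event_size_kb = 250            # event size in KB
event_rate_hz = 50             # on-line event rate in Hz
online_rate_mb_s = event_size_kb * event_rate_hz / 1000.0
print(online_rate_mb_s)        # 12.5 MB/s, matching the quoted ~12 MBps

events_2yr = 1e9               # estimated 2-year event total
raw_volume_pb = events_2yr * event_size_kb / 1e12   # 1 PB = 1e12 KB
print(raw_volume_pb)           # 0.25 PB raw data alone; ~0.5 PB once
                               # processing and analysis copies are included
```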
Collaboration
~500 physicists, 72 institutions, 18 countries
SAM (DØ & FNAL Project)
SAM is Sequential Access to data via Meta-data. The project was started in 1997 to handle DØ's needs for a Run II data system. SAM is a data-grid:
- No fully functional Grid currently exists, but SAM already provides many Grid functionalities:
  - Stations – logical collections of computers, networks, and storage
  - Transparent access to, and transport of, data between stations
  - Data cataloguing – replica management
  - Fabric management
  - Job submission (on the local station only)
http://d0db.fnal.gov/sam
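The replica-management idea can be sketched in a few lines. This is a hypothetical illustration, not the actual SAM API: files are catalogued, each replica is tied to a station, and access is routed to a local copy when one exists.

```python
# Hypothetical sketch of station-based replica management (not SAM code).
class ReplicaCatalog:
    def __init__(self):
        self._replicas = {}  # filename -> set of station names holding a copy

    def register(self, filename, station):
        """Record that `station` holds a replica of `filename`."""
        self._replicas.setdefault(filename, set()).add(station)

    def locate(self, filename, preferred_station):
        """Return a station holding the file, preferring the local one."""
        stations = self._replicas.get(filename, set())
        if preferred_station in stations:
            return preferred_station                 # local cache hit
        return next(iter(sorted(stations)), None)    # remote replica (transfer needed)

catalog = ReplicaCatalog()
catalog.register("run42.raw", "FNAL")
catalog.register("run42.raw", "Lancaster")
print(catalog.locate("run42.raw", "Lancaster"))  # -> Lancaster (local access)
print(catalog.locate("run42.raw", "Lyon"))       # -> FNAL (remote; transfer scheduled)
```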
Deployment
Deployment II
[Diagram: Central Analysis, Datalogger, Reco-farm, and ClueD0 stations at FNAL on the LAN, each backed by MSS, linked over the WAN to remote stations – an interconnected network of primary cache stations, communicating and replicating data where it is needed.]

Current active stations:
- FNAL (several)
- Lyon FR (IN2P3)
- Amsterdam NL (NIKHEF)
- Lancaster UK
- Imperial College UK
- Others in the US
Statistics
- Number of registered users: 360
- Data in the system: 25 TB (160k files)
- Accessing > 3 TB a day; goal: 13 TB/day
- Fully integrated into the DØ Analysis Framework
SAM Collaborators
- PPDG – Particle Physics Data Grid (DØ participation)
- Condor
- Globus
- Fermilab CMS Computing Group
- iVDGL – International Virtual Data-Grid Laboratory
- IGMB – InterGrid Management Board
Future Plans
- SAM is an operational Grid
- An ideal platform for demonstrating Grid technologies on a time scale of 2 years
- Modular design allows integration of modern Grid tools
- An ideal testing ground for LHC-scale experiments: a full-scale test of Grid middleware
DØ Goals
DØ is fully committed to making SAM a fully functional GRID on the timescale of 2 years.
DØ is committed to using standard GRID tools wherever possible.
DØ has committed, and continues to commit, significant resources to the Grid.
General Goals
1. Use of standard middleware to promote interoperability:
   1. Globus security infrastructure, interoperating with the Fermilab Kerberos security infrastructure
   2. GridFTP as one of the supported file-transfer protocols
   3. Globus job submission
   4. Condor (and extensions) job submission
   5. Publish the availability and status of SAM station resources
   6. Publish a catalogue of data files and their replicas using standards, or emerging standards, from PPDG and DataGrid
General Goals II
2. Additional Grid functionality for job specification, submission, and tracking:
   1. Use of full Condor services for migration and checkpointing of jobs, as far as is possible with DØ software and the DØ software framework. This may require work on the Condor software to achieve full functionality.
   2. Incrementally building an enhanced job-specification language and job-submission services that ensure co-location of job execution and data files, and that reliably execute a chain of job processing with dependencies between job steps. The first step is expected to be work with the Condor team to provide specification and execution of a directed acyclic graph (DAG) of jobs, using an extended version of the DAGMan product that CMS is testing for its MC job execution.
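The DAG-of-jobs idea can be illustrated with a minimal DAGMan input file. The job names and submit files below are hypothetical, not part of the actual DØ system; each JOB line names a Condor submit description, and PARENT/CHILD lines express the dependencies between job steps:

```
# Hypothetical three-step MC chain as a DAGMan input file.
JOB  generate     generate.sub
JOB  simulate     simulate.sub
JOB  reconstruct  reconstruct.sub
PARENT generate  CHILD simulate
PARENT simulate  CHILD reconstruct
```

DAGMan only releases a job to the Condor queue once all of its parents have completed successfully, which is exactly the "reliable chain of job processing" behaviour described above.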
General Goals III
3. Enhanced monitoring and diagnostic capabilities:
   1. Extensions to the existing system of logging all activity to both local and central log files, as demanded by robustness requirements and increased use of the system.
   2. Incorporation of the emerging Grid Monitoring Architecture and monitoring tools. Little exists on this at present; the work will involve collaborating with other Grid projects and participating in Global Grid Forum working groups.
Proposed Applications
Monte Carlo Production System
Upgrade the distributed Monte Carlo production system as a short-term use case for demonstrating, incrementally, essential components of the Grid. In particular, this involves demonstrating transparent job submission to a number of DØ processing centres (SAM stations), reliable execution of jobs, transparent data access and storage, and an interface that lets users understand and monitor the state of their MC requests and of the resultant MC jobs and data that satisfy those requests.
Proposed Applications
General User Applications
Demonstrate analysis of data products (both MC and detector data) on desktop systems in the UK, using an enhanced version of the DØ SAM Grid system that incrementally incorporates Grid middleware components. This will not only demonstrate active use of a Grid for analysis, but will also eventually demonstrate interoperability between the DØ Grid and other emerging Grid testbeds.
Milestones
Month 6: Integration of the Globus security infrastructure and GridFTP into SAM, with deployment at several UK stations interoperating with Fermilab and other SAM stations. First demonstration of the MC production system, using the request interface and automated job submission to one SAM station, with limited intelligence in job distribution and load balancing.
Month 12: Fully commissioned MC production system with reliable execution of jobs, splitting into sub-jobs as necessary, and intelligent job distribution and load balancing that takes into account the economics of data movement versus job movement.
Month 24: A fully robust, production-quality system with excellent monitoring, interoperability with other Grid projects and the EU DataGrid, and sharing of some resources. Updated fabric capable of handling the data of each experiment.
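The "economics of data movement versus job movement" trade-off can be sketched as a simple cost comparison. All numbers and names here are hypothetical, purely to illustrate the decision, not the actual scheduling logic:

```python
# Illustrative sketch of the data-movement vs job-movement trade-off:
# run locally only if shipping the input here beats queueing remotely.
def best_site(input_gb, wan_mb_per_s, local_queue_s, remote_queue_s):
    transfer_s = input_gb * 1024 / wan_mb_per_s   # time to ship input locally
    move_data_cost = transfer_s + local_queue_s   # transfer, then local queue
    move_job_cost = remote_queue_s                # data already at remote site
    return "local" if move_data_cost < move_job_cost else "remote"

# 100 GB over a 5 MB/s WAN link: moving the job to the data wins.
print(best_site(input_gb=100, wan_mb_per_s=5, local_queue_s=60, remote_queue_s=3600))
# 1 GB over a fast link: moving the data to the job wins.
print(best_site(input_gb=1, wan_mb_per_s=100, local_queue_s=60, remote_queue_s=3600))
```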
CDF and SAM
- Rick St Denis talk
- CDF to use SAM for data access
- UK setting up test facilities
- Combined DØ and CDF proposal
Proposal
Request: 6 FTEs
- Four of the additional FTEs will work on integrating core Grid functionality into SAM (this talk)
- One FTE for DØ applications (MC and analysis)
- One FTE to integrate CDF software with SAM
Conclusions
- SAM is an operational Grid
- It offers a great opportunity for testing Grid middleware
- Goal: a fully functional test-bed in two years