The Experiment Dashboard
CCRC08 - Measuring our progress
The Experiment Dashboard
ISGC 2008, 9-11 April 2008
Pablo Saiz, Julia Andreeva, Benjamin Gaidioz, Anastasia Ivanchecnko, Gerhild Maier, Ricardo Rocha, Irina Sidirova
IT-GS-MND
CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it
Overview
Dashboard structure
Dashboard in production
Job Monitoring
Grid reliability
Prodsys
Data Management
SAM, FTS monitoring
Site status board
Future development
Conclusions
Dashboard Framework
Web / HTTP Interface
Data Access Layer (DAO)
Agents
Oracle DB
DB reading and writing via DAO layer
Connection pooling
Easy to add interface for a different backend
Collectors of information
Common configuration and management
Multiple clients: cli, web
Multiple output formats: plain text, csv, xml, xhtml
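The layering on this slide (all database access through a DAO, several output formats on top) can be illustrated with a minimal Python sketch. This is not the Dashboard's actual code: the class JobDAO, the function render, the jobs table and the site names are hypothetical, and sqlite3 stands in for the Oracle backend so that the example runs anywhere.

```python
import csv
import io
import sqlite3
from xml.sax.saxutils import quoteattr


class JobDAO:
    """Hypothetical data access object: the only code that talks to the DB."""

    def __init__(self, connection):
        # sqlite3 is a stand-in backend; the SQL and the "?" placeholders
        # below are sqlite-specific and would differ for Oracle.
        self.conn = connection

    def create_schema(self):
        self.conn.execute("CREATE TABLE IF NOT EXISTS jobs (site TEXT, status TEXT)")

    def add_job(self, site, status):
        self.conn.execute(
            "INSERT INTO jobs (site, status) VALUES (?, ?)", (site, status)
        )

    def jobs_per_site(self):
        cur = self.conn.execute(
            "SELECT site, COUNT(*) FROM jobs GROUP BY site ORDER BY site"
        )
        return cur.fetchall()


def render(rows, fmt):
    """Serialise (site, count) rows as plain text, csv or xml."""
    if fmt == "text":
        return "\n".join(f"{site} {count}" for site, count in rows)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf, lineterminator="\n")
        writer.writerow(["site", "jobs"])
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "xml":
        items = "".join(
            f'<site name={quoteattr(site)} jobs="{count}"/>' for site, count in rows
        )
        return f"<sites>{items}</sites>"
    raise ValueError(f"unsupported format: {fmt}")
```

A client (command line or web) would call something like render(dao.jobs_per_site(), "csv"). Because only JobDAO contains SQL, supporting a different backend means writing another class with the same methods.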
Dashboard activities
Common applications (ALICE, ATLAS, CMS, LHCb, VleMed):
Job monitoring
Site reliability
Experiment specific applications:
Transfer monitoring for ALICE
Data management monitoring for ATLAS
Production monitoring for ATLAS and CMS (prototypes)
IO rate monitoring between WN and SE (prototype)
Site availability based on the results of SAM tests
Job Robot monitoring
Accounting information from Apel and Gratia for ATLAS (prototype)
Task monitoring for CMS analysis users (ATLAS on the way)
CMS integration and commissioning
Job Monitoring
Display all the jobs submitted by a VO
Follow the status of the jobs
Collect information from different sources: RGMA, IC Real Time Monitor, BDII, MonALISA, ...
Very useful for VO managers, site admins and users
Possibility to get the output in different formats
Deployed for ALICE, ATLAS, CMS, LHCb and VleMed
Site Reliability
Efficiency of the different sites
Jobs and job attempts
List of most common errors, and recipes for the solutions!
Generic application
Automatic generation of monthly reports
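As an illustration of the difference between job-level and attempt-level efficiency, here is a minimal Python sketch. The record layout, the function name site_efficiency and the rule that a job counts as successful if any of its attempts succeeded are assumptions made for the example, not the Dashboard's documented definition.

```python
from collections import defaultdict


def site_efficiency(attempts):
    """attempts: iterable of (site, job_id, succeeded) tuples, one per attempt.

    Returns {site: (attempt_efficiency, job_efficiency)}.
    """
    attempt_ok = defaultdict(int)
    attempt_total = defaultdict(int)
    job_ok = defaultdict(dict)  # site -> {job_id: True if any attempt succeeded}

    for site, job_id, succeeded in attempts:
        attempt_total[site] += 1
        attempt_ok[site] += 1 if succeeded else 0
        # Assumption: one successful attempt is enough for the job to count.
        job_ok[site][job_id] = job_ok[site].get(job_id, False) or succeeded

    result = {}
    for site in attempt_total:
        jobs = job_ok[site]
        result[site] = (
            attempt_ok[site] / attempt_total[site],
            sum(jobs.values()) / len(jobs),
        )
    return result
```

Under these assumptions, a site where every job eventually succeeds after resubmission shows a job efficiency of 1.0 but a lower attempt efficiency, which is why the two views are worth keeping apart.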
Production System
ATLAS Prodsys
Identify failing tasks and jobs
Evaluate the performance of the sites
Daily/weekly/monthly statistics
User guide
Data Management
Monitoring of T0 and the production system
Report of transfers to the different sites
Integrated with the ATLAS management system
Information on the clouds, sites, SEs and datasets
History of errors
FTS reliability
Daily report on the success of transfers
Drill-down list of errors
Integrated in the ALICE environment
Extremely useful during the different ALICE challenges: PDC06, PDC07, CCRC08
Working on making it generic
SAM monitoring
Service Availability Monitoring
Clickable plots to drill down: site availability, service availability, service tests
Links to the SAM results
At the moment, only for CMS
ATLAS requested a similar interface
Ongoing work to make it generic
Site Status Board
Table with the status of the different sites for CMS
Easy definition of new metrics
The metrics can come from different sources
Links to more detailed information
At the moment, deployed for CMS
It could be used by other VOs
Working on providing history and aggregation
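One way to picture "easy definition of new metrics" coming from different sources is a small registry in which each metric is a function of the site name. The Python sketch below is only an illustration: the decorator metric, the two example metrics, their thresholds and the in-memory dictionaries standing in for external sources are all hypothetical.

```python
METRICS = {}  # metric name -> function(site) returning a status string


def metric(name):
    """Decorator that registers a new column for the board."""
    def register(func):
        METRICS[name] = func
        return func
    return register


# Made-up numbers standing in for two different information sources.
SAM_AVAILABILITY = {"SITE_A": 0.98, "SITE_B": 0.40}
JOB_SUCCESS = {"SITE_A": 0.91, "SITE_B": 0.75}


@metric("sam")
def sam_status(site):
    # The 0.9 threshold is an assumption for the example.
    return "ok" if SAM_AVAILABILITY.get(site, 0.0) >= 0.9 else "error"


@metric("jobs")
def job_status(site):
    # The 0.8 threshold is an assumption for the example.
    return "ok" if JOB_SUCCESS.get(site, 0.0) >= 0.8 else "warning"


def status_board(sites):
    """Build one row per site, with one cell per registered metric."""
    return {
        site: {name: func(site) for name, func in METRICS.items()}
        for site in sites
    }
```

In this sketch, adding a column to the table amounts to writing one more decorated function; status_board itself does not change.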
Experiment Dashboard plans
Include more data sources: condor_g, L&B, ...
Security: X509 authentication
New applications: pilot jobs, input collections
Improve existing applications
Make the SAM interface generic
More in-depth failure analysis
User requests and suggestions
Integration with the GridMap technology
Conclusions
The Experiment Dashboard provides:
Several monitoring applications
Integration of information from different sources
Multiple output formats: html, xml, csv, txt, ...
Generic applications: Job Monitoring, Grid reliability
Experiment specific applications: DDM, ProdSys, Site Status Board, SAM, ...
Used in production by multiple VOs
User, installation and developer guides
http://dashboard.cern.ch