The Experiment Dashboard

The Experiment Dashboard. ISGC 2008, 9-11 April 2008. Pablo Saiz, Julia Andreeva, Benjamin Gaidioz, Anastasia Ivanchecnko, Gerhild Maier, Ricardo Rocha, Irina Sidirova, IT-GS-MND.

CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it

Overview

Dashboard structure

Dashboard in production: Job Monitoring, Grid reliability, Prodsys, Data Management, SAM and FTS monitoring, Site Status Board

Future development

Conclusions

Dashboard Framework

Components: Web / HTTP interface, Data Access Layer (DAO), agents, Oracle DB

DB reading and writing via the DAO layer

Connection pooling

Easy to add an interface for a different backend

Agents: collectors of information, with common configuration and management

Multiple clients: CLI, web

Multiple output formats: plain text, CSV, XML, XHTML
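The DAO layer and connection pooling can be illustrated with a minimal Python sketch. It is not the Dashboard's code. ConnectionPool, JobDAO, sqlite_factory and the job_status table are hypothetical names, and a shared in-memory SQLite database stands in for the Oracle backend. Callers only ever talk to the DAO, so all the SQL and its dialect stay in one class. That class is the part that would need rewriting to support a different backend.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Fixed-size pool of DB-API connections built by a factory callable."""

    def __init__(self, factory, size=2):
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(factory())

    @contextmanager
    def connection(self):
        conn = self._idle.get()  # blocks until a connection is free
        try:
            yield conn
        finally:
            self._idle.put(conn)  # hand it back for reuse instead of closing it


class JobDAO:
    """Hypothetical DAO: all SQL for job status lives here, not in callers."""

    def __init__(self, pool):
        self._pool = pool
        with self._pool.connection() as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS job_status "
                "(job_id TEXT PRIMARY KEY, status TEXT)"
            )
            conn.commit()

    def record_status(self, job_id, status):
        with self._pool.connection() as conn:
            conn.execute(
                "INSERT OR REPLACE INTO job_status (job_id, status) VALUES (?, ?)",
                (job_id, status),
            )
            conn.commit()

    def status_of(self, job_id):
        with self._pool.connection() as conn:
            rows = conn.execute(
                "SELECT status FROM job_status WHERE job_id = ?", (job_id,)
            ).fetchall()  # fetchall() finishes the statement so no read lock lingers
        return rows[0][0] if rows else None


def sqlite_factory():
    # Stand-in for an Oracle connection: a named in-memory SQLite database
    # shared by every connection opened in this process.
    return sqlite3.connect("file:dashboard_demo?mode=memory&cache=shared", uri=True)
```

The pool returns connections to the queue instead of closing them. With a server database, opening a new session for every request is typically the expensive part, so reusing connections saves that cost.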

Dashboard activities

Common applications (ALICE, ATLAS, CMS, LHCb, VleMed): Job monitoring, Site reliability

Experiment-specific applications:

Transfer monitoring for ALICE

Data management monitoring for ATLAS

Production monitoring for ATLAS and CMS (prototypes)

IO rate monitoring between WN and SE (prototype)

Site availability based on the results of SAM tests

Job Robot monitoring

Accounting information from APEL and Gratia for ATLAS (prototype)

Task monitoring for CMS analysis users (ATLAS on the way)

CMS integration and commissioning

Job Monitoring

Display all the jobs submitted by a VO and follow their status

Collect information from different sources: R-GMA, IC Real Time Monitor, BDII, MonALISA, ...

Useful for VO managers, site administrators and users

Output available in different formats

Deployed for ALICE, ATLAS, CMS, LHCb and VleMed
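When several sources report on the same job, they can disagree. The sketch below is minimal and assumes that each source delivers timestamped status reports and that the most recent report wins. The Dashboard's actual merging rules may differ. StatusReport, merge_reports, to_csv and to_xml are hypothetical names. The sketch also renders the merged view in two of the output formats.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusReport:
    job_id: str
    status: str
    timestamp: int  # seconds since the epoch, as reported by the source
    source: str     # e.g. "RGMA" or "MonALISA"


def merge_reports(*sources):
    """Keep the most recent report per job across all sources (assumed rule)."""
    latest = {}
    for reports in sources:
        for r in reports:
            current = latest.get(r.job_id)
            if current is None or r.timestamp > current.timestamp:
                latest[r.job_id] = r
    return [latest[job_id] for job_id in sorted(latest)]


def to_csv(reports):
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["job_id", "status", "source"])
    for r in reports:
        writer.writerow([r.job_id, r.status, r.source])
    return buf.getvalue()


def to_xml(reports):
    root = ET.Element("jobs")
    for r in reports:
        ET.SubElement(root, "job", id=r.job_id, status=r.status, source=r.source)
    return ET.tostring(root, encoding="unicode")
```

Because the merge happens before formatting, every output format shows the same view of each job.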

Site Reliability

Efficiency of the different sites, in terms of both jobs and job attempts

List of the most common errors, with recipes for solving them

Generic application

Automatic generation of monthly reports
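Efficiency can be counted per job or per job attempt. The two numbers can differ a lot when jobs are resubmitted. The sketch below is minimal and rests on two assumptions. First, each attempt record carries the job ID, the attempt number, the site, the outcome and the error message. Second, a job is attributed to the site of its last attempt. Attempt, site_efficiency and top_errors are hypothetical names.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Attempt(NamedTuple):
    job_id: str
    seq: int              # attempt number within the job (1, 2, ...)
    site: str
    ok: bool
    error: Optional[str]  # error message if the attempt failed


def site_efficiency(attempts):
    """Return {site: (attempt_efficiency, job_efficiency)}.

    Assumption: a job belongs to the site of its last attempt and counts as
    successful if that last attempt succeeded.
    """
    tried, passed, last = Counter(), Counter(), {}
    for a in attempts:
        tried[a.site] += 1
        passed[a.site] += a.ok  # True counts as 1, False as 0
        if a.job_id not in last or a.seq > last[a.job_id].seq:
            last[a.job_id] = a
    jobs, jobs_ok = Counter(), Counter()
    for a in last.values():
        jobs[a.site] += 1
        jobs_ok[a.site] += a.ok
    return {
        site: (passed[site] / tried[site],
               jobs_ok[site] / jobs[site] if jobs[site] else None)
        for site in tried
    }


def top_errors(attempts, n=3):
    """Most frequent error messages among failed attempts."""
    return Counter(a.error for a in attempts if not a.ok).most_common(n)
```

With one failed and one successful attempt of the same job, a site scores 0.5 per attempt but 1.0 per job. This gap is why both views are worth showing.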

Production System

ATLAS Prodsys

Identify failing tasks and jobs

Evaluate the performance of the sites

Daily/weekly/monthly statistics

User guide
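Daily, weekly and monthly statistics are one aggregation with different time buckets. The sketch below is minimal. It assumes each job record is a (completion date, site, state) tuple and that weeks follow ISO 8601 numbering. period_key, job_statistics and the site name are hypothetical, not Prodsys APIs.

```python
from collections import Counter
from datetime import date


def period_key(day, granularity):
    """Map a date to its daily, weekly (ISO 8601 week) or monthly bucket."""
    if granularity == "daily":
        return day.isoformat()
    if granularity == "weekly":
        year, week, _ = day.isocalendar()
        return f"{year}-W{week:02d}"
    if granularity == "monthly":
        return f"{day.year}-{day.month:02d}"
    raise ValueError(f"unknown granularity: {granularity}")


def job_statistics(jobs, granularity):
    """Count jobs per (period, site, state) from (date, site, state) tuples."""
    stats = Counter()
    for finished_on, site, state in jobs:
        stats[(period_key(finished_on, granularity), site, state)] += 1
    return stats


# Hypothetical job records, for illustration only.
jobs = [
    (date(2008, 4, 7), "SITE-A", "finished"),
    (date(2008, 4, 9), "SITE-A", "failed"),
    (date(2008, 4, 9), "SITE-A", "finished"),
]
weekly = job_statistics(jobs, "weekly")
```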

Data Management

Monitoring of the T0 and of the production system

Reports on transfers to the different sites

Integrated with the ATLAS management system

Information on clouds, sites, SEs and datasets

History of errors
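Information on clouds, sites and SEs forms a hierarchy. Counting each transfer once at every level lets a view drill down from a cloud to a single SE. The sketch below is minimal and assumes transfers arrive as (cloud, site, SE, succeeded) tuples. The function rollup and the cloud, site and SE names used with it are hypothetical.

```python
from collections import defaultdict


def rollup(transfers):
    """Aggregate transfer outcomes at cloud, site and SE level.

    Each transfer is (cloud, site, se, ok). Returns {key: [done, failed]}, where
    key is (cloud,), (cloud, site) or (cloud, site, se).
    """
    totals = defaultdict(lambda: [0, 0])
    for cloud, site, se, ok in transfers:
        path = (cloud, site, se)
        for depth in range(1, 4):
            # index 0 counts successful transfers, index 1 failed ones
            totals[path[:depth]][0 if ok else 1] += 1
    return dict(totals)
```

Tuple prefixes as keys keep all three levels in one dictionary. A drill-down view then only has to extend the key by one element.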

FTS reliability

Daily report on the success of transfers

Drill-down list of errors

Integrated in the ALICE environment

Extremely useful during the ALICE challenges: PDC06, PDC07, CRC08

Work in progress to make it generic
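A daily success report with a drill-down list of errors can be built in a single pass over finished transfers. The sketch below is minimal and assumes each transfer is a (finish time, FTS channel, error message or None) tuple. The function daily_report and the channel name in the example are hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def daily_report(transfers):
    """Per (day, channel): success rate plus a drill-down list of error messages.

    transfers: iterable of (finished_at: datetime, channel: str, error: str or None),
    where error is None for a successful transfer.
    """
    total = Counter()
    done = Counter()
    errors = defaultdict(Counter)
    for finished_at, channel, error in transfers:
        key = (finished_at.date().isoformat(), channel)
        total[key] += 1
        if error is None:
            done[key] += 1
        else:
            errors[key][error] += 1
    return {
        key: {
            "success_rate": done[key] / total[key],
            # most frequent errors first, for drilling down
            "errors": errors[key].most_common(),
        }
        for key in total
    }


# Hypothetical transfers on a hypothetical channel, for illustration only.
report = daily_report([
    (datetime(2008, 4, 9, 10, 0), "CHANNEL-A", None),
    (datetime(2008, 4, 9, 11, 0), "CHANNEL-A", "transfer timeout"),
    (datetime(2008, 4, 9, 12, 0), "CHANNEL-A", "transfer timeout"),
    (datetime(2008, 4, 9, 13, 0), "CHANNEL-A", None),
])
```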

SAM monitoring (Service Availability Monitoring)

Clickable plots to drill down from site availability to service availability to the individual service tests

Links to the SAM results

At the moment, only for CMS; ATLAS has requested a similar interface

Work in progress to make it generic
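Drilling down from site to service to test implies an aggregation rule. The sketch below is minimal and assumes a rule of the following shape, which is not taken from the slides. A service instance is up only if all of its critical tests pass. A service type is up if at least one of its instances is up. A site is available only if every required service type is up. The data layout, the test names and the functions instance_ok, site_available and drill_down are hypothetical.

```python
def instance_ok(test_results, critical):
    """Up only if every critical test passed; a missing result counts as a failure."""
    return all(test_results.get(test) == "ok" for test in critical)


def site_available(site, critical_tests):
    """site: {service_type: {instance: {test_name: result}}}
    critical_tests: {service_type: [test_name, ...]} (assumed layout)
    """
    return all(
        any(instance_ok(results, critical_tests[stype])
            for results in site.get(stype, {}).values())
        for stype in critical_tests
    )


def drill_down(site, critical_tests):
    """Per service type, the up/down state of each instance (next level of the plot)."""
    return {
        stype: {inst: instance_ok(res, critical_tests[stype])
                for inst, res in site.get(stype, {}).items()}
        for stype in critical_tests
    }
```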

Site Status Board

Table with the status of the different CMS sites

Easy definition of new metrics, which can come from different sources

Links to more detailed information

At the moment deployed for CMS; it could be used by other VOs

Work in progress on providing history and aggregation
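One way to make new metrics easy to define is to register one fetch function per metric and build the table from whatever is registered. The sketch below is minimal. The registry, the metric decorator, the metric names and the site names are all hypothetical. The hard-coded values stand in for queries to real sources.

```python
from typing import Callable, Dict, List

# Registry: metric name -> function returning {site: status}
METRICS: Dict[str, Callable[[], Dict[str, str]]] = {}


def metric(name):
    """Decorator registering a metric; each one may read from a different source."""
    def register(fetch):
        METRICS[name] = fetch
        return fetch
    return register


@metric("SAM")
def sam_status():
    # Hypothetical values; a real fetcher would query the SAM results.
    return {"T1_SITE_A": "ok", "T2_SITE_B": "warning"}


@metric("JobRobot")
def job_robot_status():
    # Hypothetical values; a real fetcher would query Job Robot results.
    return {"T1_SITE_A": "ok"}


def status_table(sites: List[str]) -> List[List[str]]:
    """Header row plus one row per site; missing values are shown as "n/a"."""
    names = sorted(METRICS)
    values = {name: METRICS[name]() for name in names}
    header = ["site"] + names
    rows = [[site] + [values[name].get(site, "n/a") for name in names] for site in sites]
    return [header] + rows
```

Adding a column only takes one more decorated function. The table code itself never changes.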

Experiment Dashboard plans

Include more data sources: condor_g, L&B

Security: X509 authentication

New applications: pilot jobs, input collections

Improve existing applications: make the SAM interface generic, more in-depth failure analysis, user requests and suggestions

Integration with the GridMap technology

Conclusions

The Experiment Dashboard provides:

Several monitoring applications

Integration of information from different sources

Multiple output formats: HTML, XML, CSV, text

Generic applications: Job Monitoring, Grid reliability

Experiment-specific applications: DDM, ProdSys, Site Status Board, SAM

Used in production by multiple VOs

User, installation and developer guides

http://dashboard.cern.ch