Global ADC Job Monitoring Laura Sargsyan (YerPhI).
![Page 1: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/1.jpg)
Global ADC Job Monitoring
Laura Sargsyan (YerPhI)
![Page 2: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/2.jpg)
30/11/2010 ATLAS Software & Computing Workshop
Motivation
Provide an overview of job processing in the scope of ATLAS, regardless of the submission tools and execution backends.
Adapt CMS job monitoring for the analysis users, preserving the content but improving the visualization.
![Page 3: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/3.jpg)
Architecture of Dashboard Job monitoring
Diagram components:
• Data sources: Instrumented GANGA UI, instrumented jobs, PanDA DB
• Transport: MSG (ActiveMQ)
• Dashboard Collector
• Dashboard JobRepository (Dashboard DB)
• User interfaces: Historical view UI, Job summary UI, Analysis JobMonitoring UI
![Page 4: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/4.jpg)
Main components (1)
Dashboard GANGA reporting: GANGA plugin that publishes master/subjob statuses and meta information to MSG
MSG: messaging system through which messages are published and consumed
Data collectors:
• msg-consume2db: listens to messages from MSG
• PanDA collector: retrieves data from PanDA DB
Watchdog scripts: scheduled procedures that send alarms by SMS or e-mail in case of problems
DB scheduled services, collector alarms, cronjob scripts
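The collector role described above (listen to status messages, store them in the monitoring repository) can be sketched as follows. This is a minimal illustration, not the actual msg-consume2db code: the JSON message layout, the `job_status` table, and the function names are all assumptions, and an in-memory SQLite database stands in for the Dashboard DB. The messages are passed in as plain strings rather than read from an ActiveMQ broker.

```python
import json
import sqlite3


def open_repository(path=":memory:"):
    """Open a stand-in repository (hypothetical schema, SQLite instead of the real DB)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_status ("
        "job_id TEXT PRIMARY KEY, status TEXT NOT NULL, "
        "site TEXT, updated TEXT NOT NULL)"
    )
    return conn


def consume(conn, raw_messages):
    """Store each well-formed JSON status message; return how many were stored.

    The last message processed for a job_id overwrites the earlier row.
    Malformed messages are skipped rather than aborting the batch.
    """
    stored = 0
    for raw in raw_messages:
        try:
            msg = json.loads(raw)
            row = (msg["job_id"], msg["status"], msg.get("site"), msg["timestamp"])
        except (ValueError, KeyError, TypeError, AttributeError):
            continue  # not JSON, not an object, or a required field is missing
        conn.execute(
            "INSERT OR REPLACE INTO job_status (job_id, status, site, updated) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
        stored += 1
    conn.commit()
    return stored
```

In a real deployment the `raw_messages` iterable would be fed by a broker subscription, and a watchdog could alarm when `consume` stores nothing for too long.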
![Page 5: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/5.jpg)
Main components (2)
Data repository: DB triggers populate data from GANGA job reporting and PanDA DB into the database tables
Web application layer: responsible for the HTTP entry point to the data; exposes the data in different formats (JSON, XML, CSV)
User interfaces: provide a user-centric view
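As an illustration of exposing the same data in JSON, XML, and CSV, here is a small sketch using only the Python standard library. The `render` function, the `jobs`/`job` element names, and the sample row are hypothetical; the actual web application layer is not shown in the slides.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def render(rows, fmt):
    """Serialize a list of flat dicts as JSON, XML, or CSV text.

    Dict keys are used as XML tag names, so they must be valid XML names.
    """
    if fmt == "json":
        return json.dumps({"jobs": rows}, sort_keys=True)
    if fmt == "xml":
        root = ET.Element("jobs")
        for row in rows:
            job = ET.SubElement(root, "job")
            for key, value in row.items():
                ET.SubElement(job, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    if fmt == "csv":
        out = io.StringIO()
        # Column order follows the keys of the first row.
        fields = list(rows[0]) if rows else []
        writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return out.getvalue()
    raise ValueError("unsupported format: %s" % fmt)
```

For example, `render([{"site": "CERN-PROD", "running": 12}], "csv")` yields a header line followed by one data line.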
![Page 6: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/6.jpg)
Implementation
Importing PanDA data into the schema, which is validated and tuned for monitoring purposes
Instrumentation of GANGA jobs submitted via WMS, local submission, and CREAM CE for MSG reporting
Data from MSG is collected in the monitoring repository
As a result, all ATLAS job monitoring data, for both analysis and production, is collected in a common monitoring schema
Setting up aggregation procedures for data accounting
Adapting the CMS dashboard interactive and accounting user interfaces for ATLAS (adding sorting and filtering by cloud)
![Page 7: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/7.jpg)
Job Summary (1)
Interactive view: what is going on now regarding job processing in the scope of ATLAS
Aimed at different types of users: individual scientists using the Grid for data analysis, user support teams, site admins, VO managers, managers of different computing projects.
Job Summary enables very flexible access to recent monitoring data and shows the job processing of a VO at run time
![Page 8: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/8.jpg)
Job Summary (2) http://dashb-atlas-jobdev.cern.ch/dashboard/request.py/jobsummary
![Page 9: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/9.jpg)
Analysis Job Monitoring (1)
Collects and exposes a user-centric set of information to the user regarding submitted tasks.
Focused on the user's perspective.
Offers a wide selection of graphical plots.
User-driven development.
Provides a consistent way of following a user’s analysis jobs regardless of the submission tool.
Detailed information on twiki:
https://twiki.cern.ch/twiki/bin/view/ArdaGrid/TaskMonitoringWebUI
The “Analysis Job Monitoring” web interface will be presented today in the “Distributed Analysis” session by Jakub Moscicki
![Page 10: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/10.jpg)
Analysis Job monitoring (2)
meta information
http://dashb-atlas-jobdev.cern.ch/templates/client/index.html
Includes:
• Full bookmarking capability
• Working 'refresh' capability
• “Breadcrumbs” navigation element
• Easy search
• History support
“Time period” selection: from-till and time range selection
![Page 11: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/11.jpg)
Analysis Job monitoring (3)
Resubmission history
Link to the PanDA monitoring page for each (PanDA) job
![Page 12: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/12.jpg)
Historical views
Functionality
• Number of terminated, submitted, pending, running jobs
• Distribution of failed jobs by failure codes/reasons/categories
• CPU/wall clock consumption, efficiency as CPU versus wall clock
• Processed events: number of processed events as a function of time, CPU/wall clock time spent on a single event
• Resource utilization, number of used slots, efficiency of site usage compared to pledges
• Activities at the site. Single site view with job processing metrics. Data transfer distributions will be added soon.
• All data can be filtered by site, activity, cloud
• Any time range can be selected
• Available granularities are hourly/daily/weekly/monthly
• All data is available in machine-readable format
• All plots are available via direct link
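The "efficiency as CPU versus wall clock" metric and the time granularities can be illustrated with a short sketch. The record fields (`finished`, `cpu_s`, `wall_s`), the timestamp format, and the function name are assumptions made for this example; the weekly granularity is left out for brevity. Efficiency per time bucket is computed as total CPU time divided by total wall clock time, which is one plausible reading of the slide rather than the dashboard's confirmed formula.

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns that map a finish time to its bucket label.
BUCKET_FORMATS = {
    "hourly": "%Y-%m-%d %H:00",
    "daily": "%Y-%m-%d",
    "monthly": "%Y-%m",
}


def cpu_efficiency(jobs, granularity="daily"):
    """Return {bucket: total CPU seconds / total wall clock seconds}.

    A bucket with zero wall clock time maps to None instead of dividing by zero.
    """
    fmt = BUCKET_FORMATS[granularity]
    totals = defaultdict(lambda: [0.0, 0.0])  # bucket -> [cpu, wall]
    for job in jobs:
        finished = datetime.strptime(job["finished"], "%Y-%m-%dT%H:%M:%S")
        bucket = finished.strftime(fmt)
        totals[bucket][0] += job["cpu_s"]
        totals[bucket][1] += job["wall_s"]
    return {
        bucket: (cpu / wall if wall > 0 else None)
        for bucket, (cpu, wall) in sorted(totals.items())
    }
```

Summing before dividing weights long jobs more heavily than short ones; averaging per-job ratios would be a different, equally defensible definition.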
![Page 13: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/13.jpg)
Historical views
Terminated, Submitted, Pending, Running Jobs
http://dashb-atlas-job-dev.cern.ch/dashboard/request.py/dailysummary
Click the appropriate button to create a plot
Granularity: Hourly, Daily, Monthly
Time Range: 24 h, 48 h, week, month, custom
Click on the plot to zoom in and out
![Page 14: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/14.jpg)
Historical views
Status of terminated jobs
Chosen parameters: All T1 + T0; time range 48 hours; granularity hourly
Chosen parameters: All T1 + T0; time range 48 hours; granularity hourly; sorted by activities (production jobs)
Click on the header of the plot to get links to the machine-readable format or a direct link to the plot
![Page 15: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/15.jpg)
Historical views
Failed/Aborted jobs. Error codes
Chosen parameters: All T1 + T0; time range 48 hours; granularity hourly
2 kinds of failure:
• Application (transExitCode)
• GRID (pilot, brokerage, ddm, jobDispatcher, supervisor, execution, taskBuffer)
Application failures should be grouped by component (e.g. site, user, application)
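The two kinds of failure can be told apart with a small classification sketch. The dictionary keys below reuse the component names from the slide, but the record layout (`errors` mapping, integer codes with 0 meaning "no error") and both function names are assumptions for illustration, not the PanDA schema.

```python
from collections import Counter

# Grid-side components listed on the slide.
GRID_COMPONENTS = (
    "pilot", "brokerage", "ddm", "jobDispatcher",
    "supervisor", "execution", "taskBuffer",
)


def failure_kinds(error_codes):
    """Return the failure kinds present in a {component: code} mapping.

    A missing key or a code of 0 is treated as "no error" (an assumption).
    """
    kinds = []
    if error_codes.get("transExitCode", 0) != 0:
        kinds.append("application")
    grid = [c for c in GRID_COMPONENTS if error_codes.get(c, 0) != 0]
    if grid:
        kinds.append("grid:" + ",".join(grid))
    return kinds


def count_application_failures(jobs, group_by="site"):
    """Count jobs with an application failure, grouped by one job attribute."""
    return Counter(
        job[group_by]
        for job in jobs
        if job["errors"].get("transExitCode", 0) != 0
    )
```

Passing `group_by="user"` instead of `"site"` would give the per-user grouping mentioned on the slide, provided the job records carry that field.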
![Page 16: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/16.jpg)
Milestones and achievements
The monitoring plugin for ATLAS Ganga users has been publishing information about jobs to MSG since 17/05/2010
All ATLAS job monitoring data has been collected in the common schema in real time since 26/09/2010
Aggregation procedures for feeding the summary DB tables have been set up since 8/10/2010
Interactive and accounting UIs have been available to the ATLAS community since 11/10/2010
Historical views now contain data imported from the PanDA archive, starting from 1/01/2010.
![Page 17: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/17.jpg)
Historical views
Job monitoring data starting from 1/01/2010
![Page 18: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/18.jpg)
![Page 19: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/19.jpg)
Plans for the next year
Migrate the DB to the production server after validation by ATLAS (January 2011)
Improve performance of the Interactive UI (January 2011)
Add user interface for the production shifters (February-March 2011)
![Page 20: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/20.jpg)
Effort and sustainability
Developers:
• Laura Sargsyan (ATLAS)
• Julia Andreeva (IT)
• Edward Karavakis (IT)
• Lukasz Kokoszkiewicz (IT)
• Jakub Moscicki (IT)
All applications (apart from the data collectors and the analysis users' web interface) are shared with CMS. CERN IT ES provides support for these applications.
![Page 21: Global ADC Job Monitoring Laura Sargsyan (YerPhI).](https://reader035.fdocuments.us/reader035/viewer/2022062413/5a4d1b667f8b9ab0599b0ac6/html5/thumbnails/21.jpg)
Waiting for your feedback
E-mail:[email protected]@cern.ch