Performance Monitoring
Gidon Moont
e-Science, HEP, Imperial College London
Talk to JRA1 All-Hands Meeting @ CERN
24 March 2006
Introduction
• How we gather data.
• How we release the information.
  – Real Time Monitor
  – LCG Load Monitor
  – Daily Reports
  – XML files and ROOT analysis
• Interesting metrics
How we gather data
• The data comes from direct queries of the MySQL databases of the Resource Brokers (RBs).
• Around 30 Resource Brokers are currently monitored.
• Queries run once a minute:
  – find all jobs that had an event in the last minute
  – retrieve status and CE/WN information
  – write a complete (XML) description of all jobs
  – remove jobs with a finished status after 2 hours (or if Cleared)
  – as a job is removed, query all its events and write a summary file
• Multithreaded (one thread per RB) Java program.
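The polling cycle above could be sketched roughly as follows. This is a minimal Python illustration, not the Java program described in the talk: `fetch_recent` and `write_summary` are hypothetical callbacks standing in for the MySQL query and the summary-file writer, the `Job` fields are assumptions, and the XML description step is left out.

```python
import threading
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional, Tuple

RETENTION_SECONDS = 2 * 60 * 60   # finished jobs are kept for 2 hours
POLL_INTERVAL_SECONDS = 60        # each RB is queried once a minute


@dataclass
class Job:
    job_id: str
    status: str                           # e.g. "Running", "Done", "Cleared"
    finished_at: Optional[float] = None   # epoch seconds; None while active


def split_expired(jobs: Dict[str, Job], now: float) -> Tuple[Dict[str, Job], List[Job]]:
    """Return (kept, removed). Cleared jobs, and jobs that reached a
    finished status at least 2 hours ago, are removed."""
    kept: Dict[str, Job] = {}
    removed: List[Job] = []
    for job_id, job in jobs.items():
        expired = (job.finished_at is not None
                   and now - job.finished_at >= RETENTION_SECONDS)
        if job.status == "Cleared" or expired:
            removed.append(job)
        else:
            kept[job_id] = job
    return kept, removed


def monitor_rb(rb_host: str,
               fetch_recent: Callable[[str, float], Iterable[Job]],
               write_summary: Callable[[str, Job], None],
               stop: threading.Event,
               clock: Callable[[], float] = time.time) -> None:
    """Poll one RB until `stop` is set (hypothetical callbacks, see above)."""
    jobs: Dict[str, Job] = {}
    while not stop.is_set():
        now = clock()
        # Jobs that had an event in the last minute replace their old record.
        for job in fetch_recent(rb_host, now - POLL_INTERVAL_SECONDS):
            jobs[job.job_id] = job
        jobs, removed = split_expired(jobs, now)
        for job in removed:
            write_summary(rb_host, job)
        stop.wait(POLL_INTERVAL_SECONDS)


def start_monitors(rb_hosts, fetch_recent, write_summary):
    """Start one daemon thread per RB; set the returned event to stop them."""
    stop = threading.Event()
    threads = [threading.Thread(target=monitor_rb,
                                args=(host, fetch_recent, write_summary, stop),
                                daemon=True)
               for host in rb_hosts]
    for thread in threads:
        thread.start()
    return stop, threads
```

Keeping the retention rule in a pure function (`split_expired`) makes it checkable without a database or running threads.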
Current RB List
gdrb01.cern.ch
gdrb02.cern.ch
gdrb03.cern.ch
gdrb04.cern.ch
gdrb06.cern.ch
gdrb07.cern.ch
gdrb08.cern.ch
gdrb09.cern.ch
gdrb10.cern.ch
gdrb11.cern.ch
lcgrb01.gridpp.rl.ac.uk
gfe01.hep.ph.ic.ac.uk
egee-rb-01.cnaf.infn.it
egee-rb-02.cnaf.infn.it
egee-rb-03.cnaf.infn.it
gridit-rb-01.cnaf.infn.it
a01-004-127.gridka.de
grid-rb0.desy.de
grid-rb2.desy.de
lcg00124.grid.sinica.edu.tw
rb01.pic.es
rb-egee.bifi.unizar.es
grid09.lal.in2p3.fr
node04.datagrid.cea.fr
mu3.matrix.sara.nl
rb.isabella.grnet.gr
rb101.grid.ucy.ac.cy
grid151.kfki.hu
lcg16.sinp.msu.ru
rb.phy.bg.ac.yu
ui.ulakbim.gov.tr
Real Time Monitor
• The Real Time Monitor has developed from a demo to show real-time usage of the LCG
• Further development will include sortable tables of RB/CE info
• Java applet - does not require extra libraries
LCG Load Monitor
• Requested as a tool to monitor London Tier 2
• Java application
• Can monitor RBs, CEs, and groups of CEs (e.g. a T2)
• Jobs colour-coded by VO (stacked)
• Sortable table of all current jobs
Daily Reports
• PDF documents created automatically at 3am
• Provide counts and metrics for all jobs that left the RTM in a 24-hour period
• Analysis split by
  – Resource Brokers
  – Virtual Organisation
  – Computing Element
• Metrics can identify problems
• Data used to generate the reports is available as a tab-delimited plain text file on request
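As an illustration of the per-RB / per-VO / per-CE split, a tab-delimited file of this kind could be tallied as below. The column names (`rb`, `vo`, `ce`, `status`) and the sample rows are assumptions made up for this sketch; the real file layout is not given in the talk.

```python
import csv
import io
from collections import Counter

# Made-up sample in an assumed layout: one job per row, tab-separated.
SAMPLE = (
    "rb\tvo\tce\tstatus\n"
    "gdrb01.cern.ch\tatlas\tce01.example.org\tDone\n"
    "gdrb01.cern.ch\tcms\tce02.example.org\tAborted\n"
    "gdrb02.cern.ch\tatlas\tce01.example.org\tDone\n"
)


def count_jobs(tsv_text: str, column: str) -> Counter:
    """Count jobs grouped by one column of a tab-delimited report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return Counter(row[column] for row in reader)


if __name__ == "__main__":
    for column in ("rb", "vo", "ce"):
        print(column, dict(count_jobs(SAMPLE, column)))
```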
XML Files and ROOT
• Information from each RB is presented as an XML file
• For efficiency reasons the RTM and LCG Load programs use a single plain text file
• To see long term trends, the data is imported into ROOT. Graphs can then be made with larger data sets, and time dependent trends can be shown.
• We currently have data for half a year (September 2005 to now)
• ROOT file available on request
Interesting Metrics
• We can identify RB problems by looking at the match time for jobs. We have established that all RBs slow down with more than 10 jobs/second being submitted.
• We can show VO behaviour by average job lengths and success rates, as well as the usage of LCG components (RBs/CEs used) and the number of users (unique DNs).
• We can measure CE/VO efficiency by both the fraction of successful jobs AND by the amount of computational WN time that resulted in a Done (Success) state against the total time of all jobs (including those that failed) - labeled as “Useful Time”.
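The two efficiency measures can differ considerably, as a small worked sketch shows. The record layout (a status string plus WN seconds per job) and the sample numbers are assumptions for illustration only.

```python
from typing import Iterable, Tuple

SUCCESS = "Done (Success)"


def efficiency(jobs: Iterable[Tuple[str, float]]) -> Tuple[float, float]:
    """Return (success_fraction, useful_time) for (status, wn_seconds) records.

    success_fraction = successful jobs / all jobs
    useful_time      = WN time of successful jobs / WN time of all jobs
    """
    jobs = list(jobs)
    total_jobs = len(jobs)
    total_time = sum(seconds for _, seconds in jobs)
    good_jobs = sum(1 for status, _ in jobs if status == SUCCESS)
    good_time = sum(seconds for status, seconds in jobs if status == SUCCESS)
    success_fraction = good_jobs / total_jobs if total_jobs else 0.0
    useful_time = good_time / total_time if total_time else 0.0
    return success_fraction, useful_time


# Three short successful jobs and one long failed job (made-up numbers):
example = [(SUCCESS, 600), (SUCCESS, 600), (SUCCESS, 600), ("Aborted", 5400)]
print(efficiency(example))  # (0.75, 0.25)
```

Here 75% of the jobs succeed, yet only 25% of the WN time was useful, because the single failure consumed most of the time.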
RB Match Times
Job scheduling (Match Time) versus load (mean number of jobs/sec during the matching)
DNs over time / VO
We can see weekends, as well as relative users per VO
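A count of this kind (unique DNs per day, per VO) could be derived from job records roughly as follows. The record layout `(epoch_seconds, vo, dn)` and the sample DNs are assumptions for this sketch; days are taken in UTC.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple


def users_per_day(records: Iterable[Tuple[float, str, str]]) -> Dict[Tuple[date, str], int]:
    """Map (UTC day, VO) to the number of distinct DNs seen that day."""
    seen = defaultdict(set)
    for epoch_seconds, vo, dn in records:
        day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).date()
        seen[(day, vo)].add(dn)
    return {key: len(dns) for key, dns in seen.items()}


# Made-up records: the same user submitting twice counts once per day.
t0 = datetime(2006, 3, 20, 12, 0, tzinfo=timezone.utc).timestamp()
records = [
    (t0, "atlas", "/CN=user a"),
    (t0 + 60, "atlas", "/CN=user a"),
    (t0 + 120, "atlas", "/CN=user b"),
    (t0 + 86400, "atlas", "/CN=user a"),
    (t0, "cms", "/CN=user c"),
]
daily = users_per_day(records)
```

Plotting these counts against the day would show weekend dips where `day.weekday() >= 5`.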
Useful Time
Useful time for those CEs that had more than 30000 jobs submitted from September 2005 to February 2006 inclusive.
URLs etc.
http://gridportal.hep.ph.ic.ac.uk/rtm/