Experiment Support
Introduction to HammerCloud for The LHCb Experiment
Dan van der Ster
CERN IT Experiment Support
3 June 2010
Outline
• Introduction to HammerCloud
  – Motivation, history, use-cases
• How HammerCloud works
  – Design and implementation details
• Interface Tour for Users and Admins
• Possibilities for an LHCb Plugin
HammerCloud Introduction for LHCb – 2
Introduction to HammerCloud
• HammerCloud (HC) is a Distributed Analysis (DA) testing system serving two use-cases:
  – Robot-like functional testing: frequent “ping” jobs to all sites to perform basic site validation
  – DA stress testing: on-demand, large-scale stress tests that use real analysis jobs to test one or many sites simultaneously, in order to:
    • Help commission new sites
    • Evaluate changes to site infrastructure
    • Evaluate software changes
    • Compare site performance
    • …
HammerCloud and Job Robots
• HammerCloud is part of an evolution of job robots:
  – The CMS Job Robot inspired the ATLAS GangaRobot (functional testing).
  – Around September 2008, a form of the ATLAS GangaRobot was used to manually stress test the Italian ATLAS Tier-2 sites:
    • 5 users manually submitted hundreds of instrumented jobs simultaneously (SIMD)
    • Results were collected and summarized manually
    • The early results proved very useful:
      – One early test showed a bimodal performance plot that was later traced to a faulty network switch, which degraded the performance of some worker nodes (WNs). The need for an automated DA stress testing system was clear.
  – HammerCloud was born in November 2008 to deliver on-demand stress tests to ATLAS sites:
    • Since then, HC has run more than 1300 tests using more than 4 million jobs.
    • ATLAS has invested more than 200k CPU-days in HC tests.
  – CMS has also agreed to use HC: a prototype was delivered in April, and scale tests are now about to begin.
HC and ATLAS during STEP’09
[Figure: STEP’09]
HammerCloud Use-Cases
• Provides on-demand and automated testing
• HC operators define test templates: FUNCTIONAL and STRESS
• Functional tests are automatically scheduled
  – Results are published on the HC website and can be pushed to other systems (e.g. SAM)
• Stress tests are generally scheduled on demand as needed by:
  – Central VO managers
  – Cloud/regional managers
  – Site managers
• For all tests, a detailed report summarizing the job success rates and performances is produced.
HammerCloud Components
• The HC UI is implemented as a Django web app, used to:
  – View test results
  – View the evolution of clouds and sites
  – Administer the DB
• State is maintained in a MySQL DB
• The HC logic (job submission, monitoring, resubmission) is implemented on top of the Ganga Grid Programming Interface (GPI)
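To make the "state in a DB, viewed through a web UI" split concrete, here is a minimal sketch of the kind of tables such a system might keep. It uses Python's built-in `sqlite3` purely as a stand-in for MySQL, and the table and column names (`test`, `job`, `status`, ...) are hypothetical, not HC's actual schema.

```python
import sqlite3

# Hypothetical, simplified schema. In HC the real state lives in MySQL behind the Django app.
SCHEMA = """
CREATE TABLE test (
    id INTEGER PRIMARY KEY,
    template TEXT NOT NULL,        -- name of the template the test was created from
    start_time TEXT NOT NULL,      -- ISO-8601 timestamps
    end_time TEXT NOT NULL
);
CREATE TABLE job (
    id INTEGER PRIMARY KEY,
    test_id INTEGER NOT NULL REFERENCES test(id),
    site TEXT NOT NULL,
    status TEXT NOT NULL,          -- e.g. 'submitted', 'running', 'completed', 'failed'
    cpu_time REAL,                 -- seconds
    wall_time REAL                 -- seconds
);
"""


def open_db(path=":memory:"):
    """Create the schema in a fresh database and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def running_jobs_per_site(conn, test_id):
    """Return {site: number of running jobs} for one test, as a web view might display it."""
    rows = conn.execute(
        "SELECT site, COUNT(*) FROM job "
        "WHERE test_id = ? AND status = 'running' GROUP BY site",
        (test_id,),
    )
    return dict(rows.fetchall())
```

For example, after inserting one test with two running jobs at `SITE_A` and one failed job at `SITE_B`, `running_jobs_per_site(conn, 1)` would return `{'SITE_A': 2}`. Keeping the job records in the DB is what lets the web UI and the submission logic run as separate processes.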
HammerCloud Logic
• An HC test is described by:
  – The analysis code to run (typically a real analysis from the user community)
  – The dataset pattern (which can be resolved to a set of datasets appropriate for the analysis code)
  – The list of sites to be tested, and the target number of jobs to run concurrently per site
  – A start time and an end time
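The four elements of a test description map naturally onto a small record type. The sketch below is illustrative only: the class name `HCTest`, its field names and the validation rules are assumptions, not HC's data model.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HCTest:
    """Hypothetical container for the elements that describe an HC test."""
    analysis_code: str            # e.g. path to a real user analysis
    dataset_pattern: str          # wildcard pattern, resolved later to concrete datasets
    sites: list[str]              # sites under test
    target_running_per_site: int  # desired number of concurrently running jobs per site
    start: datetime
    end: datetime

    def __post_init__(self):
        if self.end <= self.start:
            raise ValueError("end time must be after start time")
        if self.target_running_per_site < 1:
            raise ValueError("target number of running jobs must be positive")

    def is_active(self, now: datetime) -> bool:
        """True while `now` lies inside the test window [start, end)."""
        return self.start <= now < self.end
```

Usage might look like `HCTest("myAnalysis.py", "some.dataset.pattern.*", ["SITE_A", "SITE_B"], 50, datetime(2010, 6, 3, 10), datetime(2010, 6, 3, 22))`, where every argument is a made-up example value.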
• Test execution proceeds in 4 steps:
  – Generate: the test description is converted to a set of submittable jobs (e.g. Ganga job objects, one for each site under test)
  – Submit: the job objects are submitted
  – Run: jobs are monitored, their outputs are recorded in the HC DB, and jobs are resubmitted to maintain the target number of running jobs per site
  – Exit: at the test end time, leftover jobs are killed
• Meanwhile, the HC web interface shows real-time test results
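The Generate / Submit / Run / Exit cycle can be sketched as a simple loop. This is a toy model under stated assumptions: it makes no Ganga calls, `FakeBackend` is a hypothetical stand-in in which every job completes after a fixed number of monitoring polls, and for simplicity it creates one `Job` per running slot rather than one Ganga job object per site.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Job:
    id: int
    site: str
    status: str = "new"  # new -> running -> completed, or killed at test exit


class FakeBackend:
    """Hypothetical stand-in for Ganga: a job completes after `duration` polls."""

    def __init__(self, duration=2):
        self.duration = duration
        self.polls = {}

    def submit(self, job):
        job.status = "running"
        self.polls[job.id] = 0

    def poll(self, job):
        if job.status == "running":
            self.polls[job.id] += 1
            if self.polls[job.id] >= self.duration:
                job.status = "completed"

    def kill(self, job):
        if job.status == "running":
            job.status = "killed"


def run_test(sites, target_per_site, cycles, backend):
    ids = itertools.count()
    jobs = []

    def generate(site, n):
        # Generate: turn the test description into n submittable job objects.
        return [Job(next(ids), site) for _ in range(n)]

    def submit(new_jobs):
        # Submit: hand the job objects to the backend.
        for job in new_jobs:
            backend.submit(job)
            jobs.append(job)

    for site in sites:
        submit(generate(site, target_per_site))

    # Run: monitor jobs and top each site back up to its target number of running jobs.
    # (A real system would also record each job's outputs in the DB here.)
    for _ in range(cycles):
        for job in jobs:
            backend.poll(job)
        for site in sites:
            running = sum(1 for j in jobs if j.site == site and j.status == "running")
            submit(generate(site, max(0, target_per_site - running)))

    # Exit: at the test end time, kill whatever is still running.
    for job in jobs:
        backend.kill(job)
    return jobs
```

Calling `run_test(["SITE_A", "SITE_B"], target_per_site=2, cycles=2, backend=FakeBackend())` submits two jobs per site, resubmits replacements once the first ones complete, and kills the replacements at exit. The resubmission step is what keeps the load on each site roughly constant for the duration of a stress test.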
An HC-LHCb Plugin
• What customizations would an HC-LHCb plugin need?
• HC is built upon Ganga and exploits its job management features:
  – job repository, job configuration via Python, job submission, and job monitoring in background thread(s)
• Given the existing GangaLHCb plugins, the modifications to HC itself would be relatively minor, e.g.:
  – HC test generation:
    • Query a data discovery service to build jobs that process randomly chosen input data
  – HC test running:
    • Changes to extract LHCb-specific job metrics from Ganga
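As an illustration of the test-generation change, the sketch below picks random input files for a job at a given site. The in-memory `CATALOGUE`, the LFN-style paths and the function names are all hypothetical; a real plugin would instead query LHCb's data discovery service through GangaLHCb.

```python
import fnmatch
import random

# Hypothetical stand-in for a data discovery service: site -> list of available files.
CATALOGUE = {
    "SITE_A": ["/lhcb/data/2010/DST/00001/file_%03d.dst" % i for i in range(20)],
    "SITE_B": ["/lhcb/data/2010/DST/00002/file_%03d.dst" % i for i in range(5)],
}


def discover(site, pattern):
    """Return the files at `site` whose names match the wildcard `pattern`."""
    return [lfn for lfn in CATALOGUE.get(site, []) if fnmatch.fnmatch(lfn, pattern)]


def random_input(site, pattern, files_per_job, rng=random):
    """Choose distinct random input files for one job (fewer if the site has fewer)."""
    candidates = discover(site, pattern)
    if not candidates:
        raise LookupError(f"no data matching {pattern!r} at {site}")
    return rng.sample(candidates, min(files_per_job, len(candidates)))
```

Passing a seeded `random.Random` as `rng` makes a test's input selection reproducible, which could help when comparing a repeated or cloned test against the original.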
Interface Tour
1. The Public User Interface
HC Home
• The HC Homepage lists the running and scheduled tests.
Viewing a Test
• The test overview gives a quick summary of the overall job efficiency, CPU time/wall time, and events/wrapper time.
• It also shows a summary of the jobs running at each site involved in the test.
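The three headline numbers can be computed from per-job records roughly as follows. The record keys are hypothetical, and two definitions are assumptions rather than HC's documented formulas: efficiency is taken as completed jobs over finished (completed plus failed) jobs, and CPU/wall time and events/wrapper time are per-job ratios averaged over successful jobs.

```python
from statistics import mean


def overview_metrics(jobs):
    """jobs: list of dicts with keys status, cpu_time, wall_time, events, wrapper_time.

    Times are in seconds; jobs that are still running are ignored.
    """
    finished = [j for j in jobs if j["status"] in ("completed", "failed")]
    ok = [j for j in finished if j["status"] == "completed"]
    return {
        "efficiency": len(ok) / len(finished) if finished else None,
        "cpu_over_wall": mean(j["cpu_time"] / j["wall_time"] for j in ok) if ok else None,
        "events_per_wrapper_s": mean(j["events"] / j["wrapper_time"] for j in ok) if ok else None,
    }
```

Averaging per-job ratios weights every job equally; summing CPU and wall time first would instead weight long jobs more heavily, so it matters which convention a report uses when comparing numbers.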
Viewing a Test: Summary Stats
• The Test Overview page also gives summary statistics by site
• Here you can see some example metrics (for CMS)
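Per-site summary statistics amount to grouping job records by site and summarising one metric per group. A minimal sketch, assuming hypothetical record keys (`site`, `status`, plus the metric name) and that only completed jobs contribute:

```python
from collections import defaultdict
from statistics import mean, median


def per_site_summary(jobs, metric):
    """Group completed jobs by site and summarise one numeric metric per site."""
    values = defaultdict(list)
    for j in jobs:
        if j["status"] == "completed":
            values[j["site"]].append(j[metric])
    return {
        site: {"n": len(v), "mean": mean(v), "median": median(v), "min": min(v), "max": max(v)}
        for site, v in values.items()
    }
```

Reporting the median next to the mean is useful for site comparisons because a few pathological jobs (or a bimodal distribution like the faulty-switch case mentioned earlier) pull the mean much more than the median.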
Viewing a Test: Per-Site Plots
• View plots of the recorded metrics for each site
Viewing a Test: Metric Comparisons
• View the plots for all sites for a specific metric
• Used to compare sites against each other
Modify a Running Test
• Authorized users can modify the parameters of a test at run time
  – E.g. change the end time, or the number of running jobs per site
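A run-time modification only needs to update the stored test parameters; a run loop that rereads them each cycle picks up the change. The sketch below assumes a hypothetical `RunningTest` record and a simple boolean `authorized` flag in place of HC's real access control; it validates the whole request before changing anything.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RunningTest:
    name: str
    end: datetime
    target_running_per_site: int


def modify_test(test, now, authorized, *, end=None, target_running_per_site=None):
    """Apply run-time changes to a test; a run loop would pick them up on its next cycle."""
    if not authorized:
        raise PermissionError("only authorized users may modify a running test")
    # Validate everything first so that a bad request leaves the test unchanged.
    if end is not None and end <= now:
        raise ValueError("the new end time must be in the future")
    if target_running_per_site is not None and target_running_per_site < 0:
        raise ValueError("the target number of running jobs cannot be negative")
    if end is not None:
        test.end = end
    if target_running_per_site is not None:
        test.target_running_per_site = target_running_per_site
    return test
```

For example, `modify_test(t, now, True, end=datetime(2010, 6, 4, 10))` would extend a test overnight without touching its job target.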
Clone a Previous Test
• Cloning a previous test is simple
  – Useful to repeat the test or to run an identical test at a different set of sites
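Cloning can be modelled as copying every parameter of an earlier test, shifting its time window, and optionally swapping the site list. A sketch assuming a hypothetical frozen record `HCTestSpec`; `dataclasses.replace` returns a modified copy, so the original test is never altered.

```python
from dataclasses import dataclass, replace
from datetime import datetime


@dataclass(frozen=True)
class HCTestSpec:
    analysis_code: str
    dataset_pattern: str
    sites: tuple
    target_running_per_site: int
    start: datetime
    end: datetime


def clone(test, new_start, sites=None):
    """Copy `test` with the same duration starting at `new_start`, optionally at other sites."""
    duration = test.end - test.start
    return replace(
        test,
        start=new_start,
        end=new_start + duration,
        sites=tuple(sites) if sites is not None else test.sites,
    )
```

Keeping the analysis code and dataset pattern identical is what makes the cloned test a fair comparison, whether it is a repeat at the same sites or the same workload at new ones.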
Overall HC Plots
• Historical plots show statistics from previous tests
• They currently show the number of running jobs per site; plots showing the evolution of the performance metrics are in development.
HC Robot View
• The “Robot” view shows the success rates of functional test jobs over the past 24 hours (similar to the Site Status Board, SSB).
• Clicking a site takes you to the list of Robot jobs executed at that site
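The Robot view's per-site numbers amount to a success rate over a sliding 24-hour window. A minimal sketch, assuming each finished functional-test job is represented by a hypothetical `(site, finished_at, succeeded)` tuple:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def robot_success_rates(jobs, now, window=timedelta(hours=24)):
    """Return {site: fraction of successful jobs} for jobs finished in [now - window, now]."""
    counts = defaultdict(lambda: [0, 0])  # site -> [succeeded, finished]
    for site, finished_at, succeeded in jobs:
        if now - window <= finished_at <= now:
            counts[site][1] += 1
            counts[site][0] += int(succeeded)
    return {site: ok / total for site, (ok, total) in counts.items()}
```

Sites with no jobs in the window are simply absent from the result; a display would need to decide whether to show them as unknown rather than as failing.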
Interface Tour
2. Admin Interface
HC Admin: Operator and User Views
• HC operators can administer all tables in the HC DB via a web interface
• HC Users have more limited access
HC Admin: Tests and Templates
Above: list of all test templates. Below: list of all tests.
HC Admin: Edit a Test Template
• Test templates are defined via the Admin UI
• A template contains all of the parameters of a test, plus:
  – An active flag indicating that the template should be auto-scheduled
  – A default lifetime: auto-scheduled test instances of this template run for this time period
• Normally, functional test templates include the list of sites to be tested, whereas stress test templates do not.
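The active flag and default lifetime are enough to drive automatic scheduling. The sketch below assumes a simple policy that is not confirmed by the slides: on each pass, start one new instance of every active template that has no instance currently running, lasting for the template's default lifetime. `Template`, `ScheduledTest` and `auto_schedule` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Template:
    name: str
    kind: str            # "FUNCTIONAL" or "STRESS"
    active: bool         # auto-schedule instances of this template?
    lifetime: timedelta  # default duration of auto-scheduled instances
    sites: tuple = ()    # functional templates normally list their sites


@dataclass
class ScheduledTest:
    template: str
    sites: tuple
    start: datetime
    end: datetime


def auto_schedule(templates, scheduled, now):
    """Return new test instances for active templates with no instance running at `now`."""
    busy = {t.template for t in scheduled if t.start <= now < t.end}
    return [
        ScheduledTest(tpl.name, tpl.sites, now, now + tpl.lifetime)
        for tpl in templates
        if tpl.active and tpl.name not in busy
    ]
```

Under this policy a stress template would normally stay inactive, since its site list is supplied only when someone requests a test on demand.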
HC Admin: Adding a New Test
• Adding a new test on demand is simple: select the test template of interest, a start time, and an end time.
• If needed, tests can be further customized after the template is copied over.
Summary
• HammerCloud is a DA functional and stress testing system that is widely used by ATLAS and coming soon to CMS
• Two basic use-cases:
  – A continuous stream of test jobs to measure site availability
  – Enabling central managers to define standardized (stress) tests, and empowering site managers to invoke those tests on demand
• An HC-LHCb plugin would leverage the existing GangaLHCb work
  – A prototype plugin would not take significant effort